Deep Semantic Role Labeling Model
Paper: Deep Semantic Role Labeling: What Works and What’s Next
Training data: CoNLL 2003
Code: Deep SRL
Model Architecture
Compared with the CNN-BiLSTM-CRF model, deep-srl is much simpler, yet its performance holds up.
The model mainly modifies the LSTM, adding recurrent dropout and a highway connection.
LSTM equations:
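Written out (the standard LSTM formulation, using the same gate order as the code below):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$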
Adding the highway connection on top of the LSTM:
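A transform gate $r_t$ interpolates between the LSTM output and a linear projection of the layer input, matching `r_gate` and `k` in the code below:

$$
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
h_t &= r_t \odot o_t \odot \tanh(c_t) + (1 - r_t) \odot (W_k x_t + b_k)
\end{aligned}
$$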
Recurrent dropout is applied to the LSTM cell's output:
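During training, a Bernoulli keep-mask rescales the hidden output; written in the standard inverted-dropout form (the exact masking scheme lives in RnnDropout, sketched in the Code section below):

$$
h_t \leftarrow \frac{z}{1-p} \odot h_t, \qquad z_i \sim \mathrm{Bernoulli}(1-p)
$$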
Finally, a fully connected layer outputs a probability for each tag.
Results
Labels
{'<pad>': 0, 'B-MISC': 1, 'B-ORG': 2, 'I-MISC': 3, 'I-LOC': 4, 'I-PER': 5, 'B-LOC': 6, 'O': 7, 'I-ORG': 8}
Example 1
Sample: neuchatel @ st gallen @
Predicted: I-ORG O I-ORG I-ORG O
Gold: I-ORG O I-ORG I-ORG O
Example 2
Sample: kankkunen has set an astonishing pace for a driver who has not rallied for three months .
Predicted: I-PER O O O I-PER O O O O O O O O O O O O
Gold: I-PER O O O O O O O O O O O O O O O O
(The model incorrectly tags "astonishing" as I-PER.)
Training results
Code
LSTM modifications
- Two extra gates compared with a vanilla LSTM, corresponding to the highway transform gate and a linear transform of the previous layer's output
- Recurrent dropout uses a Bernoulli mask (see the RnnDropout sketch below)
- Each LSTM layer's output is reversed and used as the next layer's input, so stacked layers alternate direction
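RnnDropout is referenced by the cell but was not included in the original listing. Below is a minimal sketch that matches the call site's signature and the Bernoulli description above; the body is an assumption, not the repo's implementation. The imports here cover the whole listing.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class RnnDropout(nn.Module):
    # Inverted dropout with a Bernoulli keep-mask on the hidden state.
    # Note: variational recurrent dropout (Gal & Ghahramani, 2016) would
    # sample one mask per sequence and reuse it at every timestep; this
    # sketch samples a fresh mask on each call, which is how the cell
    # below invokes it.
    def __init__(self, dropout_prob, hsz, is_cuda):
        super().__init__()
        self.keep_prob = 1.0 - dropout_prob
        self.hsz = hsz
        self.is_cuda = is_cuda  # kept only for signature compatibility

    def forward(self, x):
        # Sample a keep-mask and rescale so the expected value is unchanged.
        mask = torch.bernoulli(x.new_full((x.size(0), self.hsz), self.keep_prob))
        return x * mask / self.keep_prob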
class HwLSTMCell(nn.Module):
    def __init__(self, isz, hsz, dropout_prob, is_cuda):
        super().__init__()
        self.hsz = hsz
        # Six input projections: the i, f, g, o, r gates plus the highway carry k.
        self.w_ih = nn.Parameter(torch.Tensor(6 * hsz, isz))
        # Only five hidden projections: the carry k depends on the input alone.
        self.w_hh = nn.Parameter(torch.Tensor(5 * hsz, hsz))
        self.b_ih = nn.Parameter(torch.Tensor(6 * hsz))
        self.rdropout = RnnDropout(dropout_prob, hsz, is_cuda)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.hsz)
        for weight in self.parameters():
            nn.init.uniform_(weight, -stdv, stdv)

    def forward(self, input, hidden=None):
        if hidden is None:
            hidden = input.new_zeros(input.size(0), self.hsz)
            hidden = (hidden, hidden)
        hx, cx = hidden
        # Project the input once for all six chunks.
        input = F.linear(input, self.w_ih, self.b_ih)
        # The first five chunks are gates and also receive the hidden projection.
        gates = F.linear(hx, self.w_hh) + input[..., :-self.hsz]
        in_gate, forget_gate, cell_gate, out_gate, r_gate = gates.chunk(5, 1)
        in_gate, forget_gate, out_gate, r_gate = map(
            torch.sigmoid, [in_gate, forget_gate, out_gate, r_gate])
        cell_gate = torch.tanh(cell_gate)
        # Highway carry: the sixth chunk, a linear transform of the layer input.
        k = input[..., -self.hsz:]
        cy = forget_gate * cx + in_gate * cell_gate
        # The transform gate r interpolates between the LSTM output and the carry.
        hy = r_gate * out_gate * torch.tanh(cy) + (1. - r_gate) * k
        # Recurrent dropout on the hidden output, training only.
        if self.training:
            hy = self.rdropout(hy)
        return hy, cy
class HwLSTMlayer(nn.Module):
    def __init__(self, isz, hsz, dropout_prob, is_cuda):
        super().__init__()
        self.cell = HwLSTMCell(isz, hsz, dropout_prob, is_cuda)

    def forward(self, input, reverse=True):
        # input: (seq_len, batch, isz); unroll the cell over the time dimension.
        output, hidden = [], None
        for i in range(len(input)):
            hidden = self.cell(input[i], hidden)
            output.append(hidden[0])
        if reverse:
            # Reversing the outputs makes the next stacked layer process the
            # sequence in the opposite direction (interleaved directions).
            output.reverse()
        return torch.stack(output)
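A minimal usage sketch, stacking four layers with alternating directions and projecting each hidden state to tag probabilities; the layer count, sizes, and variable names here are illustrative, not taken from the repo:

isz, hsz, num_tags = 100, 300, 9  # 9 matches the label set above
layers = nn.ModuleList(
    HwLSTMlayer(isz if l == 0 else hsz, hsz, 0.1, is_cuda=False)
    for l in range(4))
proj = nn.Linear(hsz, num_tags)

x = torch.randn(20, 8, isz)  # (seq_len, batch, features), e.g. word embeddings
for layer in layers:
    # Each layer reverses its output, so consecutive layers run in opposite
    # directions; an even layer count restores the original token order.
    x = layer(x)
log_probs = F.log_softmax(proj(x), dim=-1)  # per-token tag log-probabilities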